continuous-time stochastic gradient descent
Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks
Lugosi, Gabor, Nualart, Eulalia
We study a continuous-time approximation of the stochastic gradient descent process for minimizing the expected loss in learning problems. The main results establish general sufficient conditions for the convergence, extending the results of Chatterjee (2022) established for (nonstochastic) gradient descent. We show how the main result can be applied to the case of overparametrized linear neural network training.
Continuous-time stochastic gradient descent for optimizing over the stationary distribution of stochastic differential equations
Wang, Ziheng, Sirignano, Justin
We develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of stochastic differential equation (SDE) models. The algorithm continuously updates the SDE model's parameters using an estimate for the gradient of the stationary distribution. The gradient estimate is simultaneously updated, asymptotically converging to the direction of steepest descent. We rigorously prove convergence of our online algorithm for linear SDE models and present numerical results for nonlinear examples. The proof requires analysis of the fluctuations of the parameter evolution around the direction of steepest descent. Bounds on the fluctuations are challenging to obtain due to the online nature of the algorithm (e.g., the stationary distribution will continuously change as the parameters change). We prove bounds for the solutions of a new class of Poisson partial differential equations, which are then used to analyze the parameter fluctuations in the algorithm.
Stochastic Gradient Descent in Continuous Time
Sirignano, Justin, Spiliopoulos, Konstantinos
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $\theta_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(\theta_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions).